我们考虑使用随机球形代码的高维信号$ x $的有损压缩表示之间的分布连接,并在添加白色高斯噪声(AWGN)下的$ X $观察$ x $。我们展示了比特率 - $ R $压缩版的Wassersein距离$ x $及其在AWGN-噪声比率下的AWGN噪声比率下的观察2 ^ {2R} -1 $ 2 ^ {2r} -1 $中的下线性。我们利用此事实基于AWGN损坏的$ x $的AWGN损坏版本的估算者的风险连接到与其比特率 - $ r $量化版本相同的估算器所获得的风险。我们通过在压缩约束下导出推导问题的各种新结果来展示这种联系的有用性,包括Minimax估计,稀疏回归,压缩感和远程源编码中的线性估计的普遍性。
translated by 谷歌翻译
我们适应更高的批评(HC)拟合良好测试,以测量字频表之间的近距离。我们将这条措施应用于作者归因挑战,目标是使用其他文档来识别文档的作者。该方法简单但在没有手工和调谐的情况下表现良好;在各种当前挑战中报告最新状态的准确性。作为固有的副作用,HC计算标识了识别字的子集。在实践中,所确定的单词跨属于同质作者的语料库的文件方差低。我们得出结论,在比较新文件的相似性和单个作者的语料库中,HC大多受作者的特征的影响,并且相对不受主题结构的影响。
translated by 谷歌翻译
Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating longer sequences. We define attention resolution as an indicator of extrapolation. Then we propose two designs to improve the above metric of Transformers. Specifically, we introduce a relative position embedding to explicitly maximize attention resolution. Moreover, we use blockwise causal attention during inference for better resolution. We evaluate different Transformer variants with language modeling. Experimental results show that our model achieves strong performance in both interpolation and extrapolation settings. The code will be available at https://aka.ms/LeX-Transformer.
translated by 谷歌翻译
Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans. In this research, we examine user utterances as causes and generated responses as effects, recognizing that changes in a cause should produce a different effect. To further explore this concept, we have compiled and expanded upon a new dataset called CausalDialogue through crowd-sourcing. This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure. Our analysis reveals that traditional loss functions can struggle to effectively incorporate the DAG structure, leading us to propose a causality-enhanced method called Exponential Maximum Average Treatment Effect (ExMATE) to enhance the impact of causality at the utterance level in training neural conversation models. To evaluate the effectiveness of this approach, we have built a comprehensive benchmark using the CausalDialogue dataset leveraging large-scale pre-trained language models, and have assessed the results through both human and automatic evaluation metrics for coherence, diversity, and agility. Our findings show that current techniques are still unable to effectively address conversational DAGs, and that the ExMATE method can improve the diversity and agility of conventional loss functions while maintaining coherence.
translated by 谷歌翻译
Business processes that involve AI-powered automation have been gaining importance and market share in recent years. These business processes combine the characteristics of classical business process management, goal-driven chatbots, conversational recommendation systems, and robotic process automation. In the new context, prescriptive process monitoring demands innovative approaches. Unfortunately, data logs from these new processes are still not available in the public domain. We describe the main challenges in this new domain and introduce a synthesized dataset that is based on an actual use case of intelligent process automation with chatbot orchestration. Using this dataset, we demonstrate crowd-wisdom and goal-driven approaches to prescriptive process monitoring.
translated by 谷歌翻译
Out-of-distribution (OOD) detection has attracted a large amount of attention from the machine learning research community in recent years due to its importance in deployed systems. Most of the previous studies focused on the detection of OOD samples in the multi-class classification task. However, OOD detection in the multi-label classification task remains an underexplored domain. In this research, we propose YolOOD - a method that utilizes concepts from the object detection domain to perform OOD detection in the multi-label classification task. Object detection models have an inherent ability to distinguish between objects of interest (in-distribution) and irrelevant objects (e.g., OOD objects) on images that contain multiple objects from different categories. These abilities allow us to convert a regular object detection model into an image classifier with inherent OOD detection capabilities with just minor changes. We compare our approach to state-of-the-art OOD detection methods and demonstrate YolOOD's ability to outperform these methods on a comprehensive suite of in-distribution and OOD benchmark datasets.
translated by 谷歌翻译
We present the UC$^3$RL algorithm for regret minimization in Stochastic Contextual MDPs (CMDPs). The algorithm operates under the minimal assumptions of realizable function class, and access to offline least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys an $\widetilde{O}(H^3 \sqrt{T |S| |A|(\log (|\mathcal{F}|/\delta) + \log (|\mathcal{P}|/ \delta) )})$ regret guarantee, with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, and $\mathcal{P}$ and $\mathcal{F}$ are finite function classes, used to approximate the context-dependent dynamics and rewards, respectively. To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs, which operates under the general offline function approximation setting.
translated by 谷歌翻译
We address the general task of structured commonsense reasoning: given a natural language input, the goal is to generate a graph such as an event -- or a reasoning-graph. To employ large language models (LMs) for this task, existing approaches ``serialize'' the output graph as a flat list of nodes and edges. Although feasible, these serialized graphs strongly deviate from the natural language corpora that LMs were pre-trained on, hindering LMs from generating them correctly. In this paper, we show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language, even when the downstream task does not involve source code at all. We demonstrate our approach across three diverse structured commonsense reasoning tasks. In all these natural language tasks, we show that using our approach, a code generation LM (CODEX) outperforms natural-LMs that are fine-tuned on the target task (e.g., T5) and other strong LMs such as GPT-3 in the few-shot setting.
translated by 谷歌翻译
近年来,出于计算机视觉目的,将图像传输到远程服务器的传输急剧增加。在许多应用程序(例如监视)中,图像主要是用于自动分析的,并且很少被人类看到。在这种情况下,使用传统的压缩在比特率方面效率低下,这可能是由于关注基于人类的失真指标。因此,重要的是创建特定的图像编码方法,以供人类和机器联合使用。创建这种编解码器的机器侧的一种方法是在深神经网络中执行某些中间层执行机器任务的功能匹配。在这项工作中,我们探讨了用于培训人类和机器可学习的编解码器时所使用的层选择的效果。我们证明,使用数据处理不平等,从速率延伸的意义上讲,更深层的匹配特征是可取的。接下来,我们通过重新培训现有的可扩展人机编码模型来从经验上确认我们的发现。在我们的实验中,我们显示了这种可扩展模型的人类和机器方面的权衡,并讨论了在这方面使用更深层进行训练的好处。
translated by 谷歌翻译
我们介绍了IST和Unmabel对WMT 2022关于质量估计(QE)的共享任务的共同贡献。我们的团队参与了所有三个子任务:(i)句子和单词级质量预测;(ii)可解释的量化宽松;(iii)关键错误检测。对于所有任务,我们在彗星框架之上构建,将其与OpenKIWI的预测估计架构连接,并为其配备单词级序列标记器和解释提取器。我们的结果表明,在预处理过程中合并参考可以改善下游任务上多种语言对的性能,并且通过句子和单词级别的目标共同培训可以进一步提高。此外,将注意力和梯度信息结合在一起被证明是提取句子级量化量化宽松模型的良好解释的首要策略。总体而言,我们的意见书在几乎所有语言对的所有三个任务中都取得了最佳的结果。
translated by 谷歌翻译